Reparameterizing Mirror Descent as Gradient Descent

Neural Information Processing Systems

Most of the recent successful applications of neural networks have been based on training with gradient descent updates. However, for some small networks, other mirror descent updates learn provably more efficiently when the target is sparse. We present a general framework for casting a mirror descent update as a gradient descent update on a different set of parameters. In some cases, the mirror descent reparameterization can be described as training a modified network with standard backpropagation. The reparameterization framework is versatile and covers a wide range of mirror descent updates, even cases where the domain is constrained. Our construction for the reparameterization argument is done for the continuous versions of the updates. Finding general criteria for the discrete versions to closely track their continuous counterparts remains an interesting open problem.
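As a minimal sketch of what "casting a mirror descent update as a gradient descent update on a different set of parameters" can look like, consider the unnormalized exponentiated gradient flow, dw/dt = -w ⊙ ∇L(w). Substituting w = u ⊙ u / 4 and running plain gradient flow on u gives the same trajectory for w, since dw/dt = (u/2) du/dt = -(u ⊙ u / 4) ∇L(w). The code below compares small-step discretizations of both updates. The quadratic loss, the target vector, the step size, and all function names are assumptions chosen for illustration; they are not taken from the text above.

```python
import math

def grad_loss(w, target):
    # Gradient of the illustrative loss L(w) = 0.5 * ||w - target||^2.
    return [wi - ti for wi, ti in zip(w, target)]

def egu_step(w, target, lr):
    # Discrete unnormalized exponentiated gradient step:
    # w <- w * exp(-lr * grad), a mirror descent update on w > 0.
    g = grad_loss(w, target)
    return [wi * math.exp(-lr * gi) for wi, gi in zip(w, g)]

def reparam_gd_step(u, target, lr):
    # Plain gradient descent on u, where w = u^2 / 4.
    # Chain rule: dL/du = (u / 2) * dL/dw.
    w = [0.25 * ui * ui for ui in u]
    g = grad_loss(w, target)
    return [ui - lr * 0.5 * ui * gi for ui, gi in zip(u, g)]

def run(steps=2000, lr=1e-3):
    target = [1.0, 0.0, 0.5]   # hypothetical sparse-ish target
    w = [0.3, 0.3, 0.3]        # positive initial weights
    u = [2.0 * math.sqrt(wi) for wi in w]  # so that u^2 / 4 == w initially
    for _ in range(steps):
        w = egu_step(w, target, lr)
        u = reparam_gd_step(u, target, lr)
    w_from_u = [0.25 * ui * ui for ui in u]
    return w, w_from_u

if __name__ == "__main__":
    w, w_from_u = run()
    print("EGU weights:           ", w)
    print("GD-on-u mapped weights:", w_from_u)
```

With a small step size the two discrete trajectories stay close, which is consistent with the equivalence holding for the continuous-time updates; the sketch makes no claim about how well they agree at larger step sizes.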




Review for NeurIPS paper: Reparameterizing Mirror Descent as Gradient Descent

Neural Information Processing Systems

Additional Feedback: Suggestions: Missing definitions (anything that is not 'common knowledge' should be defined and explained before it is used; readers should not have to guess). 1. In Eq. (1), w and L are used without being defined. The paper could first introduce the problem and mention that L is the loss or target function and w is the model parameter. 'Coincides with' is not a commonly used, mathematically rigorous, or clear expression.


Review for NeurIPS paper: Reparameterizing Mirror Descent as Gradient Descent

Neural Information Processing Systems

The theoretical result of the paper is significant in my opinion. I also agree with the authors that the topic of this paper is at the core of machine learning and thus the paper should be evaluated based on its contributions. The reviewers also adjusted their reviews based on this point. However, I should also mention that the reviewers raised the concern that some of the definitions are omitted and a few parts of the paper are not rigorous enough. Therefore, I suggest that the authors resolve these ambiguities in the final version.

